Wearable vibrotactile speech aid

ABSTRACT

A method for training vibrotactile speech perception in the absence of auditory speech can comprise selecting a first word, generating a first control signal configured to cause at least one vibrotactile transducer to vibrate against a person's body with a first vibration pattern based on the first word, sampling a second word, generating a second control signal configured to cause a vibrotactile transducer to vibrate against the person's body with a second vibration pattern based on the second word, and presenting a comparison between the first word and the second word to the person. An apparatus for training vibrotactile speech perception can comprise an array of vibrotactile transducers in contact with the person's body. The array of vibrotactile transducers can replicate a vibration pattern based on one or more spoken words.

CROSS REFERENCE TO RELATED APPLICATION

This application is a continuation of U.S. application Ser. No. 16/643,824, filed on Mar. 2, 2020, which is a National Stage filing of PCT Application No. PCT/US2018/049133, filed on Aug. 31, 2018, which claims the benefit of U.S. Provisional Patent Application No. 62/553,715, filed on Sep. 1, 2017.

ACKNOWLEDGMENT OF GOVERNMENT SUPPORT

This invention was made with government support under Grant No. BCS-1439338, awarded by the National Science Foundation. The government has certain rights in the invention.

FIELD

The present disclosure relates to vibrotactile speech aids and to methods related to such devices.

BACKGROUND

Humans process speech through auditory signals. Humans can also communicate through haptics. By converting auditory signals into haptic signals, humans can learn to recognize speech through vibrotactile sensations. Proper training can improve an individual's ability to recognize speech through haptics.

SUMMARY


Embodiments of a wearable vibrotactile speech aid are disclosed herein, as well as related training methods and methods of use. These embodiments can help improve speech recognition when used as a supplement to aural speech. Some embodiments can also enable speech perception without any complementary aural speech recognition.

In one representative embodiment, a method for improving speech recognition can comprise sampling a speech signal, extracting a speech envelope from the speech signal, and generating a control signal configured to cause one or more vibrotactile transducers to vibrate against a person's body with an intensity that varies over time based on the speech envelope. The vibration can supplement or substitute for aural and/or visual speech recognition by the person.

In some embodiments, the speech envelope can be extracted using a Hilbert transform. In some embodiments, the speech envelope can be extracted using half-wave rectification and a low-pass filter. In some embodiments, the speech envelope can be extracted using a moving average filter.
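As a rough illustration of two of these envelope extractors, the following Python sketch applies half-wave rectification followed by a first-order low-pass filter and, separately, a trailing moving average of the rectified signal. The function names, the 30 Hz default cutoff, and the use of full-wave rectification (absolute value) ahead of the moving average are assumptions for illustration only; the disclosure does not specify filter orders, cutoffs, or window lengths.

```python
import math
from typing import List, Sequence


def envelope_rectify_lowpass(samples: Sequence[float], sample_rate: float,
                             cutoff_hz: float = 30.0) -> List[float]:
    """Half-wave rectify the signal, then smooth it with a one-pole low-pass filter."""
    # First-order IIR smoother: y[n] = y[n-1] + alpha * (x[n] - y[n-1])
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    envelope, y = [], 0.0
    for x in samples:
        rectified = x if x > 0.0 else 0.0  # half-wave rectification
        y += alpha * (rectified - y)
        envelope.append(y)
    return envelope


def envelope_moving_average(samples: Sequence[float], window: int) -> List[float]:
    """Trailing moving average of |x| over `window` samples (shorter at the start)."""
    ring = [0.0] * window
    total = 0.0
    envelope = []
    for n, x in enumerate(samples):
        a = abs(x)
        total += a - ring[n % window]  # add the newest sample, drop the one leaving the window
        ring[n % window] = a
        envelope.append(total / min(n + 1, window))
    return envelope
```

For a steady sine of unit amplitude, the moving average settles near 2/π (the mean of a full-wave rectified sine), while the rectify-and-filter output settles near 1/π with some residual ripple; only the shape over time, not the absolute scale, matters for driving a transducer.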

In some embodiments, the control signal can be configured to cause the transducer to vibrate at a constant frequency.

In some embodiments, the control signal can be a first control signal configured to cause a first vibrotactile transducer to vibrate, and the method can further comprise generating a second control signal configured to cause a second vibrotactile transducer to vibrate against the person's body with an intensity that varies over time based on the speech envelope.

In another representative embodiment, a method for improving speech recognition can comprise sampling a speech signal, extracting a speech envelope from the speech signal, and generating a control signal configured to cause at least one of an array of vibrotactile transducers to vibrate against a person's body. The number of vibrotactile transducers that the control signal causes to vibrate can be based on the speech envelope. The vibration can supplement aural and/or visual speech recognition by the person.

In some embodiments, each vibrotactile transducer that the control signal causes to vibrate can be caused to vibrate at a constant frequency. In some embodiments, each vibrotactile transducer that the control signal causes to vibrate can be caused to vibrate at a constant intensity.

In another representative embodiment, a method can comprise selecting a first word, generating a first control signal configured to cause at least one vibrotactile transducer to vibrate against a person's body with a first vibration pattern based on the first word, sampling a second word spoken by the person, generating a second control signal configured to cause at least one vibrotactile transducer to vibrate against the person's body with a second vibration pattern based on the sampled second word, and presenting a comparison between the first word and the second word to the person. An array of vibrotactile transducers can be in contact with the person's body.

In some embodiments, the comparison can be presented to the person in the form of auditory feedback. In some embodiments, the comparison can be presented to the person in the form of visual feedback. In some embodiments, the comparison can be presented as a percentage. In some embodiments, the comparison can be a similarity metric between the first word and the second word.

In some embodiments, the method can further comprise causing at least one of the vibrotactile transducers to vibrate with the first vibration pattern after presenting the comparison to the person, sampling a third word spoken by the person, generating a third control signal to cause at least one of the vibrotactile transducers to vibrate against the person's body with a third vibration pattern based on the sampled third word, presenting a comparison between the first word and the third word to the person, and repeating the previous steps of this method if the first word does not match the third word.

In some embodiments, the method can further comprise generating a first frequency decomposition of the first word and a second frequency decomposition of the second word. The first control signal can cause a first vibrotactile transducer to vibrate against the person's body with a vibration pattern based on a first frequency range of the first frequency decomposition and can cause a second vibrotactile transducer to vibrate against the person's body with a vibration pattern based on a second frequency range of the first frequency decomposition. The second control signal can cause the first vibrotactile transducer to vibrate against the person's body with a vibration pattern based on the first frequency range of the second frequency decomposition and can cause the second vibrotactile transducer to vibrate against the person's body with a vibration pattern based on the second frequency range of the second frequency decomposition.

In another representative embodiment, an apparatus can comprise a sampling device to sample a speech signal, a signal processing module to extract a speech envelope of the sampled speech signal, a conversion module to convert the extracted speech envelope into a vibration pattern, and a vibrotactile transducer to vibrate against a person's body. The vibration pattern can supplement aural and/or visual speech recognition by the person.

In some embodiments, the vibration pattern can have a constant frequency and an intensity that varies over time based on the speech envelope. In some embodiments, the signal processing module can extract the speech envelope using a Hilbert transform. In some embodiments, the signal processing module can extract the speech envelope using half-wave rectification and a low-pass filter. In some embodiments, the signal processing module can extract the speech envelope using a moving average filter.

In another representative embodiment, an apparatus can comprise a sampling device to sample a speech signal, a signal processing module to obtain a frequency decomposition of the sampled speech signal, a conversion module to convert the frequency decomposition into a first vibration pattern, and an array of vibrotactile transducers to vibrate against a person's body. The first vibration pattern can supplement aural and/or visual speech recognition by the person.

In some embodiments, the conversion module can convert a first frequency range of the frequency decomposition into a second vibration pattern and can convert a second frequency range of the frequency decomposition into a third vibration pattern. A first one of the vibrotactile transducers can vibrate with the second vibration pattern and a second one of the vibrotactile transducers can vibrate with the third vibration pattern.

The foregoing and other objects, features, and advantages of the disclosed technology will become more apparent from the following detailed description, which proceeds with reference to the accompanying figures.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic representation of an exemplary embodiment of a vibrotactile speech aid device.

FIG. 2 shows a visual representation of the operation of the vibrotactile speech aid device of FIG. 1.

FIG. 3 is a schematic representation of another exemplary embodiment of a vibrotactile speech aid device comprising an array of vibrotactile transducers.

FIG. 4 is a flowchart representing an exemplary training method.

FIG. 5 is a flowchart representing another exemplary training method.

FIG. 6 shows a visual representation of the operation of the vibrotactile speech aid device of FIG. 3 in one exemplary embodiment.

FIGS. 7A and 7B show results of a study on two participants using the disclosed vibrotactile device.

FIG. 8 is a diagram of an example computing system in which some described embodiments can be implemented.

DETAILED DESCRIPTION

Humans can perceive speech and other audio signals through their sense of hearing. Some humans can also perceive speech through lip reading, either in combination with or in lieu of hearing a speaker's voice. Another way that humans can perceive speech, with proper training, is through haptics. In this paradigm, speech can be converted into a vibration pattern, and one or more vibrotactile devices can vibrate against a person's skin with that vibration pattern.

With proper training, an individual can learn to interpret different vibration patterns and recognize the word or words that they correspond to. In this way, a deaf individual is able to “hear” speech through a vibrotactile speech aid device. Such a vibrotactile device can also be used to supplement the hearing and/or speech recognition of non-deaf individuals. For example, someone with partial hearing loss could use such a device to supplement their hearing and/or lip reading, and a person without any hearing loss could use the device to improve speech recognition in a noisy environment such as a bar or airport.

FIG. 1 shows an exemplary single-channel vibrotactile speech aid device 100. The vibrotactile device 100 can be placed against a person's body to aid in speech recognition. For example, the vibrotactile device 100 can be placed against a person's wrist, forearm, fingertip, tongue, or any other part of a person's body. In some examples, the vibrotactile device 100 can be part of a watch, wristband, or other piece of jewelry or clothing.

The vibrotactile device 100 comprises a microphone 102, a sampling device 104, a signal processing module 106, a conversion module 108, and a vibrotactile transducer or motor 110. The microphone 102 can comprise any type of device capable of receiving or sensing audio signals. In some examples, the microphone 102 is omitted from the vibrotactile device 100, and instead an audio jack or audio interface is provided to which a user can connect an external audio input or microphone. Alternatively, in these examples a user can connect another audio source, such as a digital media player or smartphone, to the audio interface. In the illustrated example, the microphone 102 detects and transmits an audio signal to the sampling device 104. In examples where an audio interface is provided instead of a microphone, the audio interface connects to the sampling device 104.

The sampling device 104 receives an audio signal from the microphone 102 or from an audio interface. In the illustrated example, the sampling device 104 samples or digitizes the audio signal received from the microphone 102 or audio interface, thereby creating a digital audio signal. In other examples, the sampling device 104 can be omitted or can generate an analog audio signal from the audio signal received from the microphone 102 or audio interface. In the illustrated example, the sampling device 104 transmits the sampled audio signal to the signal processing module 106.

In some embodiments, the signal processing module 106 can receive a sampled audio signal from the sampling device 104 and can extract a speech envelope from the sampled audio signal. A speech envelope tracks the amplitude of the audio signal over time: its value at any point in time equals the amplitude of the audio signal at that point, as explained further in connection with FIG. 2 below. After the signal processing module 106 extracts a speech envelope, the extracted speech envelope can be transmitted to the conversion module 108.

FIG. 2 shows a visual representation of the operation of the vibrotactile device 100. Referring to FIG. 2, block 200 shows microphone 102 receiving/sensing speech from a person speaking. Block 202 illustrates a waveform representation of a speech signal that can be sampled by the sampling device 104. Block 204 shows the speech envelope of the speech signal of block 202 that can be extracted by the signal processing module 106. The speech envelope contains sufficient information that speech in an audio signal can be recognized from the extracted speech envelope alone. A variety of techniques can be used by the signal processing module 106 to extract a speech envelope, including a Hilbert transform, a half-wave rectifier followed by a low-pass filter, or a moving average filter. Other techniques and methods of extracting a speech envelope can be used by the signal processing module 106 as well.
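For the Hilbert-transform option, one common approach computes the analytic signal with an FFT and takes its magnitude. The sketch below assumes NumPy is available; `hilbert_envelope` is a hypothetical helper that follows the same frequency-domain construction used by SciPy's `scipy.signal.hilbert`. In practice the result is often low-pass filtered further before it drives a transducer.

```python
import numpy as np


def hilbert_envelope(x) -> np.ndarray:
    """Envelope as the magnitude of the analytic signal (FFT-based Hilbert transform)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if n == 0:
        return x.copy()
    spectrum = np.fft.fft(x)
    # Analytic-signal weights: keep DC (and Nyquist when n is even),
    # double the positive frequencies, and zero the negative ones.
    weights = np.zeros(n)
    if n % 2 == 0:
        weights[0] = weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[0] = 1.0
        weights[1:(n + 1) // 2] = 2.0
    analytic = np.fft.ifft(spectrum * weights)
    return np.abs(analytic)
```

For an amplitude-modulated tone whose components fall on exact FFT bins, the returned envelope matches the modulating function to within floating-point error; for real speech, which is not periodic in the analysis window, the estimate is less accurate near the window edges.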

Referring back to FIG. 1, the conversion module 108 receives a speech envelope from the signal processing module 106 and converts the speech envelope into a vibration pattern. The vibration pattern can be generated such that the intensity of vibration as a function of time matches the amplitude of the speech envelope as a function of time. Block 206 of FIG. 2 shows a vibrotactile transducer on a person's arm, and block 208 shows a plot of vibration intensity over time that matches the speech envelope of block 204. After the conversion module 108 converts a speech envelope into a vibration pattern, a control signal is transmitted to the vibrotactile transducer 110 that causes the vibrotactile transducer to vibrate with the vibration pattern.
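A minimal sketch of this conversion step, assuming the control signal is a sampled drive waveform: a constant-frequency sine carrier whose amplitude follows the envelope, normalized so that the loudest part of the envelope maps to full intensity. The function name and the 250 Hz carrier are hypothetical choices (skin is generally most sensitive to vibration in roughly the 200 to 300 Hz range); some actuators, such as eccentric rotating-mass motors, may instead be driven by a voltage level rather than a waveform.

```python
import math
from typing import List, Sequence


def envelope_to_drive(envelope: Sequence[float], sample_rate: float,
                      carrier_hz: float = 250.0) -> List[float]:
    """Constant-frequency sine carrier whose amplitude follows the normalized envelope."""
    peak = max(envelope, default=0.0)
    scale = 1.0 / peak if peak > 0 else 0.0
    drive = []
    for n, e in enumerate(envelope):
        intensity = min(max(e * scale, 0.0), 1.0)  # normalized, clamped to [0, 1]
        drive.append(intensity * math.sin(2.0 * math.pi * carrier_hz * n / sample_rate))
    return drive
```

Because the carrier frequency never changes, the only information the wearer receives is the intensity contour, which is the single-channel behavior described for transducer 110.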

The vibrotactile transducer 110 receives the control signal from the conversion module 108 containing a vibration pattern and vibrates against a person's body according to the received vibration pattern. The vibrotactile transducer 110 can be a motor or other device capable of vibrating against a person's body and being felt by that person. In the illustrated example, the vibrotactile transducer 110 vibrates with a constant frequency and with an intensity that varies over time based on the speech envelope extracted by the signal processing module 106. In other examples, the frequency of vibration of the transducer 110 can vary based on the speech envelope.

In some examples, the vibrotactile transducer 110 can be replaced with multiple vibrotactile transducers. In these examples, the conversion module 108 can send a control signal that causes some number of vibrotactile transducers to vibrate based on the speech envelope. Specifically, the number of transducers that vibrate at a given time can increase with the amplitude of the speech envelope at that time. For example, when the amplitude of the speech envelope is low, only one transducer might vibrate. When the amplitude of the speech envelope is high, all of the transducers might vibrate. For any amplitude of the speech envelope between a minimum and maximum value, some other number of transducers can vibrate.

In some of these examples, the number of transducers that vibrate at any given time is directly proportional to the amplitude of the speech envelope at that time. In other examples, there can be a non-linear relationship between the amplitude of the speech envelope and the number of transducers that vibrate. For example, there can be a logarithmic or exponential relationship between the amplitude of the speech envelope and the number of vibrating transducers. In some examples, the frequency and intensity of the vibration of the transducers are constant no matter how many transducers are vibrating. In other examples, the frequency and/or intensity of the vibration of the various transducers can vary based on the speech envelope. In some of the examples where there are multiple vibrotactile transducers as part of device 100, a person wearing the device 100 can interpret speech based on the number of transducers that vibrate over time. In other examples where there are multiple transducers as part of device 100, a person wearing device 100 can interpret speech based on other factors such as the frequency or intensity of the vibration of one or more of the transducers.
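The count-based variant can be sketched as a function from one envelope sample to the number of active transducers. The sketch below uses hypothetical parameter names and implements the linear mapping plus one logarithmic mapping (a decibel scale with an assumed -40 dB floor); an exponential mapping could be added in the same way. Keeping at least one transducer on for any nonzero result mirrors the example above in which a low envelope drives a single transducer.

```python
import math


def active_transducer_count(amplitude: float, max_amplitude: float, n_transducers: int,
                            mapping: str = "linear", floor_db: float = -40.0) -> int:
    """Number of transducers to switch on for one envelope sample."""
    if amplitude <= 0.0 or max_amplitude <= 0.0:
        return 0
    level = min(amplitude / max_amplitude, 1.0)  # normalized amplitude in (0, 1]
    if mapping == "linear":
        fraction = level
    elif mapping == "log":
        # Map floor_db..0 dB onto 0..1; anything quieter than floor_db maps to 0.
        fraction = max(0.0, 1.0 - 20.0 * math.log10(level) / floor_db)
    else:
        raise ValueError(f"unknown mapping: {mapping!r}")
    if fraction == 0.0:
        return 0
    # Round to the nearest count, but keep at least one transducer for a nonzero fraction.
    return max(1, min(n_transducers, round(fraction * n_transducers)))
```

With 14 transducers, a half-scale envelope turns on 7 under the linear mapping, while under the logarithmic mapping it is a -20 dB envelope (one tenth of full scale) that turns on 7.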

With proper training, a person can learn to recognize words and speech based on the vibration pattern they produce in the vibrotactile device 100. Potential training methodologies are discussed in more detail later in this disclosure. Once a person has acquired this capability, the vibrotactile device 100 can be worn by that person to either supplement or replace their aural recognition of speech. The vibrotactile device 100 can operate in real time such that a person wearing the device can feel the vibrations from the device at the same time that words are being spoken. As such, the vibrotactile device can supplement the speech recognition of a person wearing the device. For example, a person with partial hearing loss can wear the device to improve their recognition of speech. Or a person with normal hearing can wear the device in a noisy environment where speech is difficult to recognize, thereby improving their ability to recognize speech in such an environment. Alternatively, the device can be used as the sole source of speech recognition without supplementing aural speech recognition. It should be understood that a speech aid as disclosed herein can be used to either supplement or replace aural speech recognition.

FIG. 3 shows an exemplary multi-channel vibrotactile device 300. Similar to the single-channel vibrotactile device 100, the multi-channel vibrotactile device 300 can be placed against a person's body to improve speech recognition as described herein or can be part of jewelry or clothing worn by the person. The vibrotactile device 300 comprises a microphone 302, a sampling device 304, a signal processing module 306, a conversion module 308, and an array of vibrotactile transducers 310.

The microphone 302 can be similar to the microphone 102 of FIG. 1. The microphone 302 can receive/detect audio and transmit the detected audio to the sampling device 304. In some examples, the microphone 302 can be omitted and replaced with an audio interface that can be used to connect any microphone or other audio input source.

The sampling device 304 can be similar to the sampling device 104 of FIG. 1. The sampling device 304 can sample audio detected by the microphone 302 and transmit sampled audio to the signal processing module 306.

The signal processing module 306 can receive a sampled audio signal from the sampling device 304 and obtain a frequency decomposition of the sampled audio signal. That is, the signal processing module 306 can convert the time-domain audio signal into a frequency-domain signal. This can be accomplished by using a Fourier transform algorithm or any other suitable technique. The frequency decomposition thus contains all of the spectral information that makes up the audio signal. After obtaining a frequency decomposition, the signal processing module 306 can transmit the frequency decomposition of the sampled audio signal to the conversion module 308.
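Because speech changes over time, a frequency decomposition used to drive transducers is typically computed on short, overlapping frames rather than over a whole recording. The sketch below assumes NumPy and uses a Hann-windowed short-time Fourier transform; `stft_magnitude` and its 512-sample frame and 256-sample hop are hypothetical defaults, not values from this disclosure.

```python
import numpy as np


def stft_magnitude(x, sample_rate: float, frame_len: int = 512, hop: int = 256):
    """Return (freqs, frames): frames[i] is the magnitude spectrum of the i-th Hann-windowed frame."""
    x = np.asarray(x, dtype=float)
    if x.size < frame_len:  # zero-pad very short inputs to one full frame
        x = np.pad(x, (0, frame_len - x.size))
    window = np.hanning(frame_len)
    starts = range(0, x.size - frame_len + 1, hop)
    frames = np.array([np.abs(np.fft.rfft(x[s:s + frame_len] * window)) for s in starts])
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    return freqs, frames
```

Each row of `frames` is one snapshot of the spectrum; successive rows give the conversion module a time sequence from which to update transducer intensities.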

The conversion module 308 can convert the frequency decomposition received from the signal processing module 306 to a vibration pattern and send a control signal to the transducers in the vibrotactile transducer array 310 to vibrate according to the vibration pattern. In the illustrated example, the array 310 comprises two rows of seven transducers each. However, in other examples, the array 310 can comprise any number of transducers in any arrangement. When the vibrotactile device 300 is worn by a person, for example on the person's forearm, the vibration of each transducer in the array 310 can be felt on a different part of that person's forearm, or of whatever part of the body the device is worn on. This allows the wearer of the device to feel the vibration of each transducer in the array 310 individually.

In the illustrated example, the conversion module 308 generates a vibration pattern such that each transducer of the array 310 vibrates with an intensity that corresponds to a different band or frequency range of the received frequency decomposition. In the illustrated example, each of the 14 transducers represents a portion of the sound wave spectrum that is audible to humans, which is around 20 Hz to 20,000 Hz. For example, one transducer may correspond to frequencies between 20 Hz and 100 Hz, another transducer may correspond to frequencies between 100 Hz and 200 Hz, and so on. In operation, the conversion module 308 causes each transducer in the array 310 to vibrate with a constant frequency but with an intensity proportional to the average amplitude of a particular corresponding frequency band in the received frequency decomposition signal. In other examples, the signal processing module 306 and the conversion module 308 can utilize any other algorithm to convert the sampled audio signal into a vibration pattern with which the transducers of the vibrotactile transducer array 310 can vibrate.
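The band-to-transducer mapping can be sketched for one analysis frame as follows. It assumes NumPy, 14 bands, and logarithmically spaced band edges from 20 Hz to 20,000 Hz (capped at the Nyquist frequency); the example bands in the text (20 to 100 Hz, 100 to 200 Hz) show that other spacings are equally possible. `band_intensities` is a hypothetical name, and each band's mean FFT magnitude is normalized so that the strongest band drives its transducer at full intensity.

```python
import numpy as np


def band_intensities(frame, sample_rate: float, n_bands: int = 14,
                     f_lo: float = 20.0, f_hi: float = 20000.0) -> np.ndarray:
    """Per-transducer intensities in [0, 1]: mean FFT magnitude in each log-spaced band."""
    frame = np.asarray(frame, dtype=float)
    mags = np.abs(np.fft.rfft(frame * np.hanning(frame.size)))
    freqs = np.fft.rfftfreq(frame.size, d=1.0 / sample_rate)
    edges = np.geomspace(f_lo, min(f_hi, sample_rate / 2.0), n_bands + 1)
    levels = np.zeros(n_bands)
    for i in range(n_bands):
        in_band = (freqs >= edges[i]) & (freqs < edges[i + 1])
        if in_band.any():  # very short frames can leave low bands without any FFT bin
            levels[i] = mags[in_band].mean()
    peak = levels.max()
    return levels / peak if peak > 0 else levels
```

With a 2048-sample frame at 44.1 kHz, a pure 1 kHz tone drives hardest the eighth band (index 7, spanning roughly 632 Hz to 1036 Hz). Normalizing per frame keeps the strongest band at full intensity; a fixed reference level could be used instead if loudness should also be conveyed.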

With proper training, a person can learn to recognize spoken words based on the vibration felt while wearing either the single-channel vibrotactile device 100 or the multi-channel vibrotactile device 300. Two potential training methods that can train an individual to use the vibrotactile device 100, 300 are now discussed. Each training method is described below with respect to a person wearing vibrotactile device 300, but it should be understood that either training method can also be done with a person wearing vibrotactile device 100 or with any other vibrotactile device capable of turning speech into vibration patterns.

The first training method is passive vibrotactile speech training. In this training method, a person wears one of the vibrotactile devices 100, 300 and is presented with speech either from an external source or generated by the person wearing the device. As this speech is heard by the person wearing the device, they also feel the vibration pattern corresponding to the speech being heard in real time. This allows the device wearer to learn how different words and sounds feel. In one example, the person wearing the device can speak and feel the vibration pattern that different sounds produce. This can allow the wearer of the device to try out different sounds and train themselves by speaking sounds they have difficulty recognizing until they are better able to detect them.

FIG. 4 is a flowchart representative of the operation of a vibrotactile device during this passive training method. FIG. 6 shows a visual representation of the operation of the vibrotactile device during this training method. FIG. 4 begins when a training subject (i.e., a person wearing the vibrotactile device) speaks a word or sound and this speech signal is sampled (block 400). In the illustrated example, sampling device 304 samples a word or sound spoken into microphone 302. Block 600 of FIG. 6 shows microphone 302 receiving/sensing speech from a person speaking, and block 602 shows a waveform representation of a speech signal that can be sampled by the sampling device 304.

After a speech signal is sampled, a vibration pattern is generated that corresponds to the sampled word (block 402). In the illustrated example, conversion module 308 generates a vibration pattern after receiving a frequency decomposition signal from the signal processing module 306 based on the sampled speech signal. Block 604 of FIG. 6 illustrates a vibration pattern generated from a frequency decomposition algorithm as discussed above. In other examples, the sampled word can be recognized by an auditory speech recognition algorithm and a stored vibration pattern corresponding to the recognized word can be selected.

After the vibration pattern is generated or selected, one or more transducers are vibrated against the body of the training subject (block 404). In the illustrated example, the transducers of vibrotactile transducer array 310 vibrate according to the vibration pattern generated by conversion module 308. This results in the training subject feeling the vibration pattern corresponding to a word or sound as they are speaking and/or hearing it. Block 606 of FIG. 6 shows four vibrotactile transducers on a person's arm that can vibrate based on the generated vibration pattern. In other examples, any number of vibrotactile transducers can be used.

This process can then be repeated any number of times with different words or sounds. The training subject can be encouraged to speak particular words that they have had difficulty identifying in the past or that are typically known to be difficult to identify based on vibration patterns. Over time, the training subject should begin to recognize the vibration patterns of different words and sounds.
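The control flow of the passive loop in FIG. 4 can be outlined in code. In the sketch below, `sample_speech`, `make_pattern`, `vibrate`, `recognize`, and the `stored` pattern table are hypothetical callables and data supplied by the caller (for example, a microphone reader and the band mapping of device 300); only the sequence of blocks 400 to 404, including the stored-pattern alternative, is shown.

```python
from typing import Callable, Dict, List, Optional, Sequence

Pattern = List[List[float]]  # assumed layout: one intensity sequence per transducer


def passive_training(
    sample_speech: Callable[[], Sequence[float]],
    make_pattern: Callable[[Sequence[float]], Pattern],
    vibrate: Callable[[Pattern], None],
    n_trials: int,
    recognize: Optional[Callable[[Sequence[float]], str]] = None,
    stored: Optional[Dict[str, Pattern]] = None,
) -> List[Pattern]:
    """Run n_trials passes of sample -> pattern -> vibrate and return the delivered patterns."""
    delivered = []
    for _ in range(n_trials):
        audio = sample_speech()                    # block 400: sample the spoken word
        word = recognize(audio) if recognize else None
        if stored is not None and word in stored:
            pattern = stored[word]                 # alternative: select a stored pattern
        else:
            pattern = make_pattern(audio)          # block 402: generate from the audio
        vibrate(pattern)                           # block 404: drive the transducers
        delivered.append(pattern)
    return delivered
```

Injecting the hardware-facing steps as callables keeps the training logic testable without a microphone or transducer array attached.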

A study was performed using the above-described passive training method to demonstrate the feasibility of the disclosed vibrotactile device. Two normal-hearing participants performed a speech-in-noise identification task both with and without a vibrotactile device. Stimuli for the study included twelve vowel-consonant-vowel syllables (e.g., aba) embedded in white noise. Noise levels were calibrated for each participant using an adaptive staircase procedure so that accuracy for syllable identification was about 60%. Participants completed 3 blocks of 120 trials each. For one block of trials, no vibration was paired with the auditory stimuli. For another block, the auditory stimuli were paired with a vibration pattern on the vibrotactile device corresponding to the speech envelope of the stimuli. For the last block, the auditory stimuli were paired with a control vibration of constant intensity. The order of the blocks was randomized for each participant.
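The study does not state which adaptive staircase rule was used. One procedure that converges near a 60% target is a weighted up-down staircase (Kaernbach's method), in which the SNR step after a correct response is smaller than the step after an incorrect one, with the ratio chosen so that target x step_down = (1 - target) x step_up. The sketch below, with hypothetical names and a 3 dB up-step, only updates the SNR track; estimating the final level (for example, by averaging the SNR at reversals) is left out.

```python
from typing import Iterable, List


def weighted_staircase(responses: Iterable[bool], start_snr_db: float = 0.0,
                       step_up_db: float = 3.0, target: float = 0.6) -> List[float]:
    """SNR track of a weighted up-down staircase aimed at `target` proportion correct."""
    # Equilibrium: target * step_down == (1 - target) * step_up
    step_down_db = step_up_db * (1.0 - target) / target
    snr = start_snr_db
    track = [snr]
    for correct in responses:
        # Correct -> harder (more noise, lower SNR); incorrect -> easier (higher SNR).
        snr = snr - step_down_db if correct else snr + step_up_db
        track.append(snr)
    return track
```

With a 60% target and a 3 dB up-step, each correct response lowers the SNR by 2 dB, so three correct and two incorrect responses return the track to its starting level, which is the balance point the rule is designed around.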

For participant 1, the vibrotactile speech envelope delivered by the vibrotactile device improved speech perception by 39.6% relative to the no-vibration condition and 17.5% relative to the control-vibration condition. These results are illustrated in FIG. 7A, which shows the accuracy of speech recognition for participant 1 for each of the three blocks. For participant 2, the vibrotactile speech envelope delivered by the vibrotactile device improved speech perception by 14% relative to the no-vibration condition and 12.5% relative to the control-vibration condition. These results are illustrated in FIG. 7B, which shows the accuracy of speech recognition for participant 2 for each of the three blocks. These results demonstrate the ability of the disclosed vibrotactile device to improve speech recognition in noisy environments.

Another training method comprises active production-perception speech training. In this training method, a training subject is actively presented with the vibrotactile representation of a certain word and then attempts to speak the word they were presented with. The vibrotactile presentation of the word can be repeated until the person correctly speaks the word back.

FIG. 5 is a flowchart representative of the operation of a vibrotactile device during this active training method. FIG. 5 begins when a target word or sound is selected (block 500). The target word could be a complete word, a partial word, a syllable, a sound, or a particular phonetic feature to be trained. The target word could be selected randomly, it could be selected because it contains a particular sound or set of sounds that are to be trained or that someone has had difficulty with in the past, or it could be selected for any other reason.

Once a target word is selected, a vibration pattern corresponding to the target word is generated (block 502). The vibration pattern can be generated by having a person or computer speak the target word, by playing an audio recording of the target word, or without the target word being spoken or played at all. In examples where the target word is spoken or a recording of the target word is played, the target word can be spoken or played into microphone 302 and the vibration pattern can be generated as described above with respect to device 300 of FIG. 3. In these examples, it is important that the training subject not be able to hear the word, as this would defeat the training. In some examples, the training subject can wear earplugs, noise-cancelling headphones, or other devices that artificially restrict the training subject's ability to hear. In other examples, a vibration pattern can be generated without the word actually being spoken or played. In these examples, a digital version of the word (e.g., a WAV file) can be transmitted directly to signal processing module 306, or vibration patterns for various target words can simply be stored electronically.

After a vibration pattern is generated, one or more vibrotactile transducers are vibrated based on the vibration pattern (block 504). This causes the training subject to feel the vibration pattern corresponding to a particular word. The training subject can then try to guess what word corresponds to the vibration pattern they felt and say that word. This spoken word is then sampled by the vibrotactile device (block 506), and a new vibration pattern is generated in real time based on this spoken word (block 508). The transducers of the device are then vibrated based on this new vibration pattern corresponding to the word spoken by the training subject (block 510). This allows the training subject to feel the vibration pattern of the word that they spoke and to compare this vibration to the vibration of the target word.

After the transducers of the device vibrate with a vibration pattern based on the word spoken by the training subject, the training subject is presented with a comparison in the form of a similarity metric between the target word and the spoken word. In some examples, the similarity metric is presented visually. In other examples, the similarity metric is presented audibly. The similarity metric is a measure of how similar the spoken word is to the target word and allows the training subject to see how close their guess was to the actual target word. In some examples, the similarity metric is based on the number of characters or phonemes that differ between the spoken word and the target word. In some examples, the similarity metric is presented as a percentage. In some examples, the similarity metric is determined by projecting the target word and the spoken word into a multidimensional space and then determining a Euclidean distance between the two signals.
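Two of these similarity measures can be sketched briefly: a Levenshtein edit distance over characters or phoneme symbols, converted to a percentage of the longer sequence's length, and a Euclidean distance between feature vectors. The function names, the percentage normalization, and the representation of each word as a fixed-length feature vector are assumptions; phoneme transcriptions (for example, from a speech recognizer) are assumed to be supplied by the caller.

```python
import math
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # substitute x -> y (free if equal)
        prev = cur
    return prev[-1]


def similarity_percent(target: Sequence[str], spoken: Sequence[str]) -> float:
    """100% for identical sequences, decreasing with the number of edits."""
    longest = max(len(target), len(spoken))
    if longest == 0:
        return 100.0
    return 100.0 * (1.0 - edit_distance(target, spoken) / longest)


def euclidean_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Straight-line distance between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)))
```

For example, comparing the phoneme sequences `["B", "AE", "T"]` and `["P", "AE", "T"]` gives one substitution out of three symbols, or a similarity of about 66.7%.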

After the transducers of the device vibrate with a vibration pattern based on the spoken word, a determination is made as to whether the spoken word matches the target word (block 512). That is, a determination is made as to whether the training subject correctly guessed the target word. If the spoken word does not match the target word, control returns to block 504 and the transducers again vibrate with a vibration pattern based on the target word. The remaining steps of FIG. 5 are then repeated, with the training subject repeatedly guessing the target word until they guess correctly. Once the word spoken by the training subject matches the target word (block 514), the method of FIG. 5 ends. By continually prompting the training subject to guess the target word, the subject is encouraged to explore and produce a range of words and sounds. By presenting the training subject with a similarity metric for each guess, the subject will learn over time to identify words based on their vibration pattern.
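The loop of FIG. 5 can be sketched as follows. The callables are hypothetical stand-ins: `present_target` vibrates the target pattern (block 504); `hear_guess` samples the subject's guess, vibrates its pattern, and returns a transcription (blocks 506 to 510); and `similarity` and `show` compute and present the comparison. The `max_rounds` cap is an added safeguard that the flowchart itself does not include.

```python
from typing import Callable, List, Tuple


def active_training(
    target: str,
    present_target: Callable[[str], None],
    hear_guess: Callable[[], str],
    similarity: Callable[[str, str], float],
    show: Callable[[float], None],
    max_rounds: int = 20,
) -> List[Tuple[str, float]]:
    """Repeat target presentation and guessing until a match (or max_rounds); return guess history."""
    history = []
    for _ in range(max_rounds):
        present_target(target)          # block 504: vibrate the target pattern
        guess = hear_guess()            # blocks 506-510: sample the guess and vibrate its pattern
        score = similarity(target, guess)
        show(score)                     # present the comparison to the subject
        history.append((guess, score))
        if guess == target:             # blocks 512/514: stop once the guess matches
            break
    return history
```

The returned history of guesses and scores could also be logged to track a subject's progress across sessions.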

Example Computing Systems

FIG. 8 depicts a generalized example of a suitable computing system 800 in which the described innovations may be implemented. The computing system 800 is not intended to suggest any limitation as to scope of use or functionality, as the innovations may be implemented in diverse general-purpose or special-purpose computing systems.

With reference to FIG. 8, the computing system 800 includes one or more processing units 810, 815 and memory 820, 825. In FIG. 8, this basic configuration 830 is included within a dashed line. The processing units 810, 815 execute computer-executable instructions. A processing unit can be a general-purpose central processing unit (CPU), processor in an application-specific integrated circuit (ASIC) or any other type of processor. In a multi-processing system, multiple processing units execute computer-executable instructions to increase processing power. For example, FIG. 8 shows a central processing unit 810 as well as a graphics processing unit or co-processing unit 815. The tangible memory 820, 825 may be volatile memory (e.g., registers, cache, RAM), non-volatile memory (e.g., ROM, EEPROM, flash memory, etc.), or some combination of the two, accessible by the processing unit(s). The memory 820, 825 stores software 880 implementing one or more innovations described herein, in the form of computer-executable instructions suitable for execution by the processing unit(s).

A computing system may have additional features. For example, the computing system 800 includes storage 840, one or more input devices 850, one or more output devices 860, and one or more communication connections 870. An interconnection mechanism (not shown) such as a bus, controller, or network interconnects the components of the computing system 800. Typically, operating system software (not shown) provides an operating environment for other software executing in the computing system 800, and coordinates activities of the components of the computing system 800.

The tangible storage 840 may be removable or non-removable, and includes magnetic disks, magnetic tapes or cassettes, CD-ROMs, DVDs, or any other medium which can be used to store information in a non-transitory way and which can be accessed within the computing system 800. The storage 840 stores instructions for the software 880 implementing one or more innovations described herein.

The input device(s) 850 may be a touch input device such as a keyboard, mouse, pen, or trackball, a voice input device, a scanning device, or another device that provides input to the computing system 800. For video encoding, the input device(s) 850 may be a camera, video card, TV tuner card, or similar device that accepts video input in analog or digital form, or a CD-ROM or CD-RW that reads video samples into the computing system 800. The output device(s) 860 may be a display, printer, speaker, CD-writer, or another device that provides output from the computing system 800.

The communication connection(s) 870 enable communication over a communication medium to another computing entity. The communication medium conveys information such as computer-executable instructions, audio or video input or output, or other data in a modulated data signal. A modulated data signal is a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media can use an electrical, optical, RF, or other carrier.

The innovations can be described in the general context of computer-executable instructions, such as those included in program modules, being executed in a computing system on a target real or virtual processor. Generally, program modules include routines, programs, libraries, objects, classes, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The functionality of the program modules may be combined or split between program modules as desired in various embodiments. Computer-executable instructions for program modules may be executed within a local or distributed computing system.

The terms “system” and “device” are used interchangeably herein. Unless the context clearly indicates otherwise, neither term implies any limitation on a type of computing system or computing device. In general, a computing system or computing device can be local or distributed, and can include any combination of special-purpose hardware and/or general-purpose hardware with software implementing the functionality described herein.

For the sake of presentation, the detailed description uses terms like “determine” and “use” to describe computer operations in a computing system. These terms are high-level abstractions for operations performed by a computer, and should not be confused with acts performed by a human being. The actual computer operations corresponding to these terms vary depending on implementation.

General Considerations

For purposes of this description, certain aspects, advantages, and novel features of the embodiments of this disclosure are described herein. The disclosed methods, apparatus, and systems should not be construed as being limiting in any way. Instead, the present disclosure is directed toward all novel and nonobvious features and aspects of the various disclosed embodiments, alone and in various combinations and sub-combinations with one another. The methods, apparatus, and systems are not limited to any specific aspect or feature or combination thereof, nor do the disclosed embodiments require that any one or more specific advantages be present or problems be solved.

Although the operations of some of the disclosed embodiments are described in a particular, sequential order for convenient presentation, it should be understood that this manner of description encompasses rearrangement, unless a particular ordering is required by specific language set forth below. For example, operations described sequentially may in some cases be rearranged or performed concurrently. Moreover, for the sake of simplicity, the attached figures may not show the various ways in which the disclosed methods can be used in conjunction with other methods. Additionally, the description sometimes uses terms like “provide” or “achieve” to describe the disclosed methods. These terms may be high-level descriptions of the actual operations that are performed. The actual operations that correspond to these terms may vary depending on the particular implementation.

As used in this application and in the claims, the singular forms “a,” “an,” and “the” include the plural forms unless the context clearly dictates otherwise. Additionally, the term “includes” means “comprises.” Further, the terms “coupled” and “associated” generally mean electrically, electromagnetically, and/or physically (e.g., mechanically or chemically) coupled or linked and do not exclude the presence of intermediate elements between the coupled or associated items absent specific contrary language.

As used herein, operations that occur “simultaneously” or “concurrently” occur generally at the same time as one another, although delays in the occurrence of one operation relative to the other due to, for example, spacing, play or backlash between components in a mechanical linkage such as threads, gears, etc., are expressly within the scope of the above terms, absent specific contrary language.

Any of the disclosed methods can be implemented as computer-executable instructions or a computer program product stored on one or more computer-readable storage media and executed on a computing device (e.g., any available computing device, including smart phones or other mobile devices that include computing hardware). Computer-readable storage media are any available tangible media that can be accessed within a computing environment (e.g., one or more optical media discs such as DVD or CD, volatile memory components (such as DRAM or SRAM), or nonvolatile memory components (such as flash memory or hard drives)). By way of example and with reference to FIG. 8, computer-readable storage media include memory 820 and 825, and storage 840. The term computer-readable storage media does not include communication connections (e.g., 870) such as signals and carrier waves.

Any of the computer-executable instructions for implementing the disclosed techniques as well as any data created and used during implementation of the disclosed embodiments can be stored on one or more computer-readable storage media (e.g., non-transitory computer-readable media). The computer-executable instructions can be part of, for example, a dedicated software application or a software application that is accessed or downloaded via a web browser or other software application (such as a remote computing application). Such software can be executed, for example, on a single local computer (e.g., any suitable commercially available computer) or in a network environment (e.g., via the Internet, a wide-area network, a local-area network, a client-server network (such as a cloud computing network), or other such network) using one or more network computers.

For clarity, only certain selected aspects of the software-based implementations are described. Other details that are well known in the art are omitted. For example, it should be understood that the disclosed technology is not limited to any specific computer language or program. For instance, the disclosed technology can be implemented by software written in C++, Java, Perl, JavaScript, Adobe Flash, or any other suitable programming language. Likewise, the disclosed technology is not limited to any particular computer or type of hardware. Certain details of suitable computers and hardware are well known and need not be set forth in detail in this disclosure.

Furthermore, any of the software-based embodiments (comprising, for example, computer-executable instructions for causing a computer to perform any of the disclosed methods) can be uploaded, downloaded, or remotely accessed through a suitable communication means. Such suitable communication means include, for example, the Internet, the World Wide Web, an intranet, software applications, cable (including fiber optic cable), magnetic communications, electromagnetic communications (including RF, microwave, and infrared communications), electronic communications, or other such communication means.

The disclosed methods, apparatus, and systems should not be construed as limiting in any way. Instead, the present disclosure is directed toward all novel and nonobvious features and aspects of the various disclosed embodiments, alone and in various combinations and sub-combinations with one another. The disclosed methods, apparatus, and systems are not limited to any specific aspect or feature or combination thereof, nor do the disclosed embodiments require that any one or more specific advantages be present or problems be solved. Any feature of any embodiment can be combined with any other disclosed feature in any combination.

In view of the many possible embodiments to which the principles of the disclosed technology may be applied, it should be recognized that the illustrated embodiments are only preferred examples of the disclosed technology and should not be taken as limiting the scope of the disclosed technology. Rather, the scope of the disclosure is at least as broad as the following claims. We therefore claim all that comes within the scope of these claims.

We claim:
1. A method for improving speech recognition comprising: sampling a speech signal; extracting a speech envelope from the speech signal; and generating a control signal configured to cause one or more vibrotactile transducers to vibrate against a person's body with an intensity that varies over time based on the speech envelope such that the vibration supplements aural or visual speech recognition by the person.
2. The method of claim 1, wherein the speech envelope is extracted using a Hilbert transform.
3. The method of claim 1, wherein the speech envelope is extracted using half-wave rectification and a low-pass filter.
4. The method of claim 1, wherein the speech envelope is extracted using a moving average filter.
5. The method of claim 1, wherein the control signal is configured to cause the transducer to vibrate at a constant frequency.
6. The method of claim 1, wherein the control signal is a first control signal configured to cause a first vibrotactile transducer to vibrate, and further comprising generating a second control signal configured to cause a second vibrotactile transducer to vibrate against the person's body with an intensity that varies over time based on the speech envelope.
7. A method for improving speech recognition comprising: sampling a speech signal; extracting a speech envelope from the speech signal; and generating a control signal configured to cause at least one of an array of vibrotactile transducers to vibrate against a person's body, wherein the number of vibrotactile transducers that the control signal causes to vibrate is based on the speech envelope, and wherein the vibration supplements aural or visual speech recognition by the person.
8. The method of claim 7, wherein each vibrotactile transducer that the control signal causes to vibrate is caused to vibrate at a constant frequency.
9. The method of claim 7, wherein each vibrotactile transducer that the control signal causes to vibrate is caused to vibrate at a constant intensity.
10. An apparatus comprising: a sampling device to sample a speech signal; a signal processing module to extract a speech envelope of the sampled speech signal; a conversion module to convert the sampled speech envelope into a vibration pattern; and a vibrotactile transducer to vibrate against a person's body with the vibration pattern to supplement aural and/or visual speech recognition by the person.
11. The apparatus of claim 10, wherein the vibration pattern has a constant frequency and an intensity that varies over time based on the speech envelope.
12. The apparatus of claim 10, wherein the signal processing module extracts the speech envelope using a Hilbert transform.
13. The apparatus of claim 10, wherein the signal processing module extracts the speech envelope using half-wave rectification and a low-pass filter.
14. The apparatus of claim 10, wherein the signal processing module extracts the speech envelope using a moving average filter.
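The envelope-based control signals recited in the claims can be illustrated with a short NumPy sketch. It covers the three recited envelope extractors (Hilbert transform, half-wave rectification followed by a low-pass filter, and a moving average filter) and the two recited mappings: a constant-frequency drive whose intensity follows the envelope, and a count of active transducers in an array that follows the envelope. The one-pole filter, its 30 Hz cutoff, the full-wave rectification before the moving average, the 250 Hz carrier, and all function names are assumptions for illustration only.

```python
import numpy as np


def envelope_hilbert(x: np.ndarray) -> np.ndarray:
    """Magnitude of the analytic signal, built with an FFT-based Hilbert transform."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep the DC bin
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist bin
        weights[1:n // 2] = 2.0           # double positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * weights))


def envelope_rectify_lowpass(x: np.ndarray, fs: float, cutoff_hz: float = 30.0) -> np.ndarray:
    """Half-wave rectification, then a one-pole low-pass (output is proportional to amplitude)."""
    rectified = np.maximum(x, 0.0)
    alpha = 1.0 - np.exp(-2.0 * np.pi * cutoff_hz / fs)  # smoothing coefficient
    env = np.empty_like(rectified)
    state = 0.0
    for i, value in enumerate(rectified):
        state += alpha * (value - state)
        env[i] = state
    return env


def envelope_moving_average(x: np.ndarray, window: int) -> np.ndarray:
    """Moving average of the full-wave rectified signal (rectification is an assumption)."""
    kernel = np.ones(window) / window
    return np.convolve(np.abs(x), kernel, mode="same")


def _normalize(env: np.ndarray) -> np.ndarray:
    """Scale the envelope to 0..1 so it can drive intensity or element count."""
    peak = float(np.max(env))
    return env / peak if peak > 0 else np.zeros_like(env, dtype=float)


def intensity_drive(env: np.ndarray, fs: float, carrier_hz: float = 250.0) -> np.ndarray:
    """Constant-frequency vibration whose intensity varies with the envelope."""
    t = np.arange(len(env)) / fs
    return _normalize(env) * np.sin(2.0 * np.pi * carrier_hz * t)


def active_transducer_count(env: np.ndarray, n_transducers: int) -> np.ndarray:
    """Number of array elements to switch on at each sample, scaled by the envelope."""
    return np.rint(_normalize(env) * n_transducers).astype(int)
```

In the count-based variant, each active element would be driven at a constant frequency and intensity, so only the number of vibrating elements carries the envelope.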